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Summary. Characteristic scale is a notion that pervades the geophysical sciences, but it has 
no widely accepted precise definition. The wavelet transform decomposes a time series into 
coefficients that are associated with different scales. The variance of these coefficients can 
be used to decompose the variance of the time series across different scales. A practical 
[j2 • definition for characteristic scale can be formulated in terms of peaks in plots of the wavelet 

variance versus scale. This paper presents basic theory for characteristic scales based upon 
the discrete wavelet transform, proposes a natural estimator for these scales and provides a 
large sample theory for this estimator that permits the construction of confidence intervals for 
a true unknown characteristic scale. Computer experiments are presented that demonstrate 

c/3 . the efficacy of the large sample theory for finite sample sizes. Examples of characteristic 

scale estimation are given for global temperature records, coherent structures in river flows, 
the Madden-Julian oscillation in an atmospheric time series and transects of one type of Arctic 

^ . sea ice. 

^ . 

1 . Introduction 

Time series in geophysics and other areas often seem to be describable as a series of 'states' 
L^ , or 'events' whose durations tend to cluster around a value known as a characteristic scale. 

^— V ' Although the notion of characteristic scale is widespread in the physical sciences, it does 

not have a precise definition independent of summary statistics that have been proposed to 
extract it from particular time series. Several definitions for characteristic scale are discussed 
in von Storch and Zwiers (1999) for time series that can be modeled as a stochastic process 
^^ ' Xt, t G Z (the set of all integers). One definition involves quantifying the 'memory' of the 

process. Suppose that P[Xt+r > | X4 > 0] > 0.5 for small lags r, but P[Xt+T > 0\Xt > 
0] = 0.5 at large lags. The smallest r such that the latter relationship holds is one way to 
define a characteristic scale. Although this definition has some intuitive appeal because it 
is based on the length of time that a process takes to 'forget' its current positive state, von 
Storch and Zwiers note that it is of limited practical value; for example, it leads to an infinite 
characteristic scale for a first-order autoregressive (AR(1)) process, one of the most popular 
models in time series analysis. Under the additional assumption that Xt is a wide-sense 
stationary process with variance cr^, a more useful definition compares Xt to a process Yt 
consisting of independent and identically distributed random variables, also with variance 
(7^. The sample mean of Yi, I2, ■ • • , ^w has variance u^/iV, whereas that for Xi, X2, . . . , X^ 
can be expressed as a'^/N' , where N' is referred to as the equivalent sample size. The limit 
of the ratio N/N' as A^ — ?> 00 defines a quantity td known as the decorrelation time. As 
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von Storch and Zwiers argue, td is a reasonable definition for characteristic scale for some 
- but not all - time series. The practicality of this measure is partly due to its relationship 
with the autocorrelation sequence (ACS) pk for Xt, namely, 

oo 
TD = l + 2"^Pk. (1) 

fc=l 

Appropriate estimates of pk can thus be used as the basis for an estimate of td] for an 
AR(1) process, we have rp = (1 + pi) /{I — pi). More generally, von Storch and Zwiers note 
that the variances of statistics other than the sample mean can be used in a similar manner 
to define other measures of characteristic scale. Other approaches for defining characteristic 
scale have been discussed in the literature. Four examples are Simonetti et al. (1985), who 
cast the definition in terms of the structure function (basically a reformulation of the ACS); 
Cordes (1986), who uses the shape of the ACS; Higuchi (1988), who links characteristic 
scale to a measure of fractal dimension; and Tsonis et al. (1998), who define the concept 
in terms of fiuctuations from cumulative sums in combination with dctrcnding via singular 
spectrum analysis (see Section 6.1 for details). 

In this article we propose a new definition for characteristic scale based upon the discrete 
wavelet transform (DWT) of Xt . The DWT is often described as a scale-based transform 
(see, e.g., Flandrin, 1999, Percival and Walden, 2000, and Nason, 2008). To fix ideas, let us 
focus on the Haar DWT. This transform yields wavelet coefficients, say VF^j"", that reflect 
changes in adjacent averages spanning integer scales r at times indexed by t; to be precise, 

T-l -, T-1 



r -^-^ T ^-^ 

1=0 i=a 



If the 'events' in a time series have a characteristic duration of t, then iVF^j"! will tend to be 
large at certain indices t. Under the assumption that Xt is a stationary process, a summary 
of the 'largeness' of | W^^'"'! across t is provided by a time-independent quantity var {W^j'"} 
known as the wavelet variance. The wavelet variance provides a scale-based decomposition 
of the variance of Xt (see Section 2.1 for details), so a large var{PF"j"} for a particular r 
should provide the basis for defining a characteristic scale that is in the neighborhood of r. 
The goal of this article is to expand this key idea to define a wavelet-based characteristic 
scale and to provide theory for a corresponding statistically tractable estimator. 

The remainder of this paper is organized as follows. Section 2 gives some necessary 
background on the DWT, the wavelet variance and its sampling theory for intrinsically sta- 
tionary Gaussian processes. Section 3 proposes a wavelet-based definition of characteristic 
scale and contrasts it with tjj . Section 4 deals with estimation of the wavelet-based charac- 
teristic scale and provides some large-sample theory for the proposed estimator. Section 5 
reports on a Monte Carlo study that examines the efficacy of the large-sample theory for a 
representative selection of processes and finite sample sizes. Section 6 gives four examples of 
estimating characteristic scales for actual time series. The final section (7) has a summary 
and a discussion of possible extensions. 

2. Background on the Wavelet Variance 

Here we define the wavelet variance, give an interpretation for it and present formulae that 
allows its computation, after which we review its estimation theory. 



ASSESSING CHARACTERISTIC SCALES USING WAVELETS 3 

2. 1. Definition and Basic Properties of the Wavelet Variance 

Let {Xt} be an intrinsically stationary process of integer order d > 0, defined as follows. 
If d = 0, {Xt} itself is stationary in the sense that both E{Xt} and cov {Xt+r,Xt} exist, 
are finite and are independent of t; li d > 0, then subjecting {Xt} to a dth order backward 
difference filter yields a stationary process, namely, 

whereas {X^ },..., {X^ } and {X^ } = {Xt} are all nonstationary. Under the assump- 
tion that {Xl } has a spectral density function (SDF) denoted by Sx(d) (•), the (generalized) 
SDF for {Xt} is defined to be 

[4sm (tt/jJ" 

where 4 sin (tt/) defines the squared gain function for a first-order backward difference filter 
(Yaglom, 1958). We denote the autocovariance sequence (ACVS) for {X^ '} by {si- }. 

Let {/ii,z : Z = 0, 1, . . . , ii — 1} be a unit-level Daubechies wavelet filter of width Li = 
2,4,6,... normalized such that ^[ii^i = 1/2 (Daubechies, 1992). If Li > 4, use of this 
filter is equivalent to subjecting the output from a backward difference filter of order d — 
Li/2 to a low-pass filter of width Li/2 (the case Li = 2 yields the Haar wavelet filter, 
whose coefficients {5,-5} are proportional to a first-order backward difference filter). Let 
gi^i = (—lY'^^hiXi-i-i be the corresponding scaling filter. Let 

Li-l 

1=0 

define the transfer function for the wavelet filter, and let Gi{f) denote the same for the 
scaling filter. For a level j > 2, let 

i-2 

Jf,(/)^i?l(2■'"-^/)[]Gl(2^/). 

fc=0 

The inverse Fourier transform of this function gives the impulse response sequence for the 
jth level wavelet filter {Iijj : I ^ 0, . . . Lj - 1}, where L^ = (2-' - l)(Li - 1) + 1. We 
denote the corresponding squared gain function by V-jif) = \Hj{f)\^. The filter {/ij,;} is 
approximately a bandpass filter with a passband given by |/| G (1/2^+^, 1/2-']. 
The jth level wavelet coefficient process for {Xt} is given by 

1=0 

The coefficient Wj^t is proportional to changes in adjacent weighted averages with an effec- 
tive scale (or span) of t, = 2^^^. Note that scale tj is associated with the frequency interval 
(1/(4tj), l/(2r,)] and the interval of periods [2tj,4tj). Under the assumptions that {Xt} 
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is an intrinsically stationary process of order d with SDF Sx{-) and that Li > 2d, {Wj,t} 
is a stationary process with SDF given by 

[4sin [nj)\^ 
By definition the wavelet variance for {Xt} at scale tj is the variance of {Wj,t}: 

i.| ^ var {W,,t} - / S,if) df^f n,if)Sxif) df. 

J-l/2 J-1/2 



If {Xt} is a stationary process, then 



var{Xa = ^ 



-I 



(Percival, 1995), and the wavelet variance for scale tj can be interpreted as the contribution 
to the overall variance due to changes in adjacent weighted averages over that scale (if {Xt} 
is nonstationary, the summation above diverges to infinity, but v"^ still has the interpretation 
of measuring the variability of changes in adjacent weighted averages). 

The theoretical wavelet variance for an intrinsically stationary process can be computed 
readily in terms of the ACVS {si- } for its underlying stationary component. In particular, 
we can write 

Lj—d—l Lj—d—1 Lj—d—l—T 

^-r E (^so +2 5: .(^) j: b^b^,^, 

1 = T=l 1=0 

where {b, ^ } is the dth-order cumulative summation of {hj^i}; i.e., with b- ( = hj^i, we have, 

for k — 1, . . . ,d, 

I 



^S=EC^ /=^0,l,...,L,-fc-l 



n=0 



(Lemma 1, Craigmile and Percival, 2005). Using {b- ^ }, we can write 



1=0 



Denote the transfer function and squared gain function for {b, {} as 

J z_/ J, J J [4sm \Trj)\ 

Then we can have 

Sj{f) = Bf\f)Sxi.^ (,/) and hence v^^ = ( Bf\f)Sxi.^ (,/) df. 

J-1/2 
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2.2. Estimation Theory for the Wavelet Variance 

Given a time series that can be regarded as a realization of a portion Xq, Xi, . . . , X^v-i of 
length N of the process {^t}, we can compute the level j wavelet coefficients for indices 
Lj — l<t<N— I under the assumption that Mj = N — Lj + 1 > 0. A sufficient (but 
not necessary) condition for {Wj,t} to be a zero mean stationary process is that Li > 2d (if 
Li = 2d, then {VFj.t} is necessarily stationary, but it might not have zero mean). Assuming 
that Li is chosen such that {W}.t} is a zero mean stationary process, we have 1''^ — E{Wff} 
and hence 

1 '^-^ 

v]^ir y wi (2) 



is an unbiased estimator of the wavelet variance. 

To look at the second moment properties of v'j , we assume that the Wj^t obey a mul- 
tivariate Gaussian distribution. Using the Isserlis theorem (Isserlis, 1918) and assuming 
j < k, we find that variance and covariance of j)^ and i)^ are given by 

J T=-(Mfc-l) \ *«/ 3 *= t=Lj-lM=Lfc-l 

where {sj,fc,T} is the cross-covariance sequence for the bivariate stationary processes {VFj^t} 
and {Wk,t}- 



Sj^k^T ^ cov {Wj^t+T,Wk^t} ^ E ^jV E ^fc'"™* 



(d) Ad) 
m r — l+ra 
1=0 m=0 



(when using (3) to compute var{j)^} by letting k — j, the double summation is interpreted 
as zero). 

While (3) is an exact result, it is of interest to explore an approximation to cov {;>? i>|} 
that leads to a practical scheme for estimating it. As TV — >■ oo and hence Mk -^ oo 
also, the double summation in (3) becomes negligible, whereas the first summation can be 
approximated using a Cesaro sum argument, which, followed by an appeal to Parseval's 
theorem, yields 



2 °° 2A /"^/^ 

covii^lul} ^^ Yl 4fc,r = -^^ where A,^k ^ / S,{f)Sk{.f)df. (4) 
^*^i ,_ „ ^^^j J -1/2 



"1/2 
^J.— CO " '''1 -^-1/2 

Suppose that Sj{f), < / < 1/2, is some standard nonparametric SDF estimator whose 
large-sample distribution is dictated by Sj{f)x'^/Vj where xi is a chi-square random variable 
with 7] degrees of freedom (see, e.g., Priestley, 1981). Letting Sk{f) be a similar estimator 
for Sk{f), it follows from standard theory for multivariate SDF estimation (Priestley, 1981) 
that 

E{S,if)Sk{f)}^S,{f)Sk{f)(j^ + l 

and hence that 

Aj,k ^TT- I S,{f)Sk{f)df 
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is an approximately unbiased estimator of Aj^k- Specializing to the case where Sj{f) and 
Skif) are lag window estimators based upon lag windows {wj_t} and {wk.r} (Priestley, 
1981), we have 

^].k = r, I '^]'^k + 2 X! ^J-rSj.TWk,TSk.T , (5) 

"' + '' V r = l / 

where {sj^r} is the biased estimator of the ACVS for Wj^l-^i, . . . , Wj^^^i: 






t=Lj-l 

A practical scheme for approximating cov{j>? j>^} is to substitute Aj,k for Aj^k in (4). 

Finally, we note that, under a Gaussian assumption on {Xt} and mild conditions on its 
SDF, i)^ is asymptotically normally distributed with mean v^ and large sample variance 
2Aj^j/Mj (Percival, 1995; Mondal, 2007; see Serrouk et al., 2000, for related resuhs that 
relax the Gaussian assumption). 

3. Wavelet-Based Definition of Characteristic Scale 

Because we can interpret the wavelet variance at a particular scale tj as the contribution to 
the overall variance oi{Xt] due to changes in adjacent weighted averages over that scale, we 
can formulate a wavelet-based notion of characteristic scale by searching for scales at which 
v^ is large compared to its surrounding values, thus leading to the following definitions. 

Definition 1. (Local characteristic scale Tcj.) Suppose {Xt} is an intrinsically sta- 
tionary process such that v^ > v^^i for some j > 2, with strict inequality holding in 
at least one case. Fit a quadratic yk = a + bxk + cx^ that passes through {xk,yk) = 
(log2(Tfc),log2(i^^)), k — j — ^,j,j + 1. ^ local characteristic scale is the location at which 
the quadratic is maximized: 

T,,, = 2-''/(2-) = 2-^i/^V„ where (3^ = ■^'+' ^ ^^'^ and /?2 = y,+i - 2y, + y,_i. (6) 

We note in passing that Tj/^/2 < Tcj < Tjy/2 and that the pattern i^J^i < v^ — vj^-^ > f|_|_2 
yields Tc,j = Tcj+i = Tj^2. 

Definition 2. (Global characteristic scale Tc.) Suppose {Xt} has a local characteristic 
scale Tcj such that v"^ > vf, for all k G Z excluding k = j — l,j,j + 1, where Z is the set 
of positive integers. Then {Xt} is said to have a global characteristic scale Tc = Tc.j . 

Figure 1 shows four examples of theoretical wavelet variance curves with local characteristic 
scales. If we assume that the wavelet variances at scales not depicted in the plots are all 
smaller than the ones shown, the processes associated with (a), (b) and (d) have a global 
characteristic scale, but the one for (c) does not. 

Our definitions for Tcj and Tc need some justification. Arguably a more natural def- 
inition for characteristic scale would involve a wavelet variance defined over a continuum 
of scales via a continuous wavelet transform (CWT). A local maxima of a CWT-based 
wavelet variance curve would then define a local characteristic scale, which would seem to 
be preferable to our interpolation scheme based on just the dyadic scales Tj = 2^^^. The 
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Fig. 1. Log of wavelet variances f^ versus log of Tj (circles) for (a) a first-order autoregressive 
(AR(1 )) process, (b) a linear combination of an AR(1 ) process and a fractionally differenced process; 
(c) a linear combination of an AR{1) process and white noise; and (d) a linear combination of two 
AR(1) processes (see Section 5 for precise definitions of all four processes). The vertical dashed 
lines indicate the locations the characteristic scales Tcj, while the gray curves show the quadratic fit 
whose maximum location determines Tc ,. 



following example suggests the CWT- and DWT-based definitions are quite similar, which 
leads us to prefer the latter because it is much easier to compute and because its estimator 
is statistically tractable. Consider an AR(1) process Xt = <j)Xt_i + et, where {et} is a 
Gaussian white noise process with zero mean and unit variance. For the Haar DWT, this 
process has a global characteristic scale when 0.57 < (f> < 1. The pluses in Fig. 2 show how 
Tc increases as (j) varies from 0.60 to 0.99 in steps of 0.01. Avoiding interpolation issues that 
arise in using a CWT with a time series sampled over the integers, we can readily extend 
the definition of the Haar wavelet variance to all scales t G Z^ : 



Hr) = 



4^2 ' 



A plot of j^^(t) versus r for = 0.60,0.61, ..., 0.99, shows a unique maximum, which 
provides us with a CWT-like definition of characteristic scale fc € Z^. The circles in Fig. 2 
show fc versus (f). The agreement between Tc and fc is very good and gets better as (j) 
increases. By contrast, if we were to define Tc in terms of a quadratic fit in linear/linear 
rather than log/log space, we obtain the asterisks shown in the figure, which do not agree 
nearly as well with the CWT- based definition fc. Use of linear/log space or log/linear space 
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Fig. 2. Comparison of three wavelet-based measures of characteristic scale for an AR(1) process 
with a unit-lag autocorrelation of (p. The pluses show Tc, which is based on a quadratic fit in log/log 
space around the peak in the Haar wavelet variance curve evaluated over dyadic scales Tj = 2^"\ 
The asterisks show a similar measure, but now based on a quadratic fit in linear/linear space. The 
circles are based on a measure given by the location of the peak of the Haar wavelet variance curve 
evaluated over all integer-valued scales. 



yields characteristic scales that are nearly identical to those obtained using, respectively, 
linear/linear space or log/log space. The choice of log/log space over log/linear space 
is dictated by the fact that the log transform acts as a variance-stabilizing transform for 
wavelet variance estimators (this can be seen further on by noting that, while the elements of 
El in Equation (7) depend on the wavelet variance, this quantity ratios out in the elements 
of E2 in Equation (8)). 

For a stationary process {Xt} with ACS {pk}, it is of interest to compare Tc to the 
measure of characteristic scale provided by the decorrelation time of Equation (1). For an 
AR(1) process, pk = f/'''^' decays exponentially, and we have td = (1 + 0)/(l — 0). Figure 3 
shows td versus the Haar-based Tc as ranges over 0.60, 0.61, . . . , 0.99. The two measures 
track each other as (j) gets large, with td = I.OItc for (p ~ 0.99. Figure 1(a) shows the 
Haar wavelet variance versus Tj when (j) = 0.7, for which Tc = 4.53. By contrast. Fig. 1(b) 
shows a similar wavelet variance curve for a process that is a linear combination of an 
AR(1) process with (j) = 0.75 and a fractionally differenced (FD) process with long-memory 
parameter 5 = 0.45 (Granger and Joyeux, 1980; Hosking, 1981; Beran, 1984). The two 
processes are independent of each other. Here Tc = 5.87, whereas td is infinite because 
Pk decays hyperbolically due to the influence of the FD process. This fact points out a 
fundamental difference between the measures Tc and tu: whereas the former concentrates 
on localized properties of {Xt}, the latter is influenced to a large degree by the asymptotic 
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Fig. 3. Decorrelation time td versus characteristic scale Tc for AR(1) processes with unit-lag auto- 
correlations of = 0.60, 0.61, ..., 0.99 (circles from left to right). If m and Tc had been equal, the 
circles would have fallen on the dashed line. 



decay rate of the ACS (and cannot be used at all with intrinsically stationary processes 
with d> 0). 



4. Statistical Properties of Chiaracteristic Scale Estimators 

Suppose we have a time series that can be regarded as a realization of a portion Xq, Xi, 
. . . , Xjv_i of an intrinsically stationary process of order d, based upon which we want to 
estimate local characteristic scales (assuming such exist) for {Xt}. We start by estimating 
the wavelet variance out to some maximum scale of interest tj^^ using the unbiased estimator 
Jo. If there is some 1 < j < Jo such that the estimates obey the pattern 
> J^l-i-x (with strict inequality holding in at least one case), we can define an estimator 



-hj = ^^- 

^..2 



Tcj of the characteristic scale in the region of r, by replacing yk with y^ = log2(!^^) in 
equation (6): 



'c,3 



2-^^/P^Tj, where /3] 



_ %■+! - Vl-l 



and $2 = Vj+i - '^Vj + Vj^ 



We want to establish simple - but reasonable - approximations to the sampling properties 
of this estimator under the assumption that N is large. 

Motivated by the large-sample theory reviewed at the end of Section 2.2, we start with 
the assumption that [i^?-ii ^'?i i^J+i] is multivariate Gaussian with a mean given by the 

true wavelet variances W'j-i^'^'j ,'^'j+i\ a-nd a covariance matrix dictated by equation (3). 
We use equation (4) to approximate this symmetric matrix by 



Si = 2 



^j-i,j-i/Mj-i 



A 



j-i,i+i 



/M, 






A 



j+ij+i 



/M 



i+i 



(7) 
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The delta method says that [logs i^j-i) > log2 (i^J ) . log2 (^|+i)] == [yj-i^yj^Vj+if is ap- 
proximately Gaussian with mean [logs i'^j-i) jl^gs (t'?) , logs ('^l+i)] = [?/j-ii J/ji J/j+i] 
and covariance m.atrix Es whose elem.ents are 






with TO and n = 1, 2 and 3. Since 



var {i>|_2+m} var {i>|_2+n} + (cov {i>2_ 



j-2+m' ''j-2+n 



i^7 



J)^ 



4 4 

^j-2+m^j"-2+ 



„logl2) 



(8) 



/9i 

/32 



-i i 

2^2 

1 -2 1 







= H 


yj-i 

V3 




. yj+i . 




[ Vj+i J 



it follows that [/3i,/32] is approximately Gaussian with mean [/3i,/32] and covariance 
HY,2H^ ■ Let k = — /3i//32. Further applications of the delta method say that k is ap- 
proximately Gaussian with mean —fi\llii and variance given approximately by 



ol 



var{/3i} , /32var{/32} , var {A} var {/32} + 2(cov{/3i,/32})2 



/?? 



/3| 



2 >^2 

2/',,„^ rfl i\2 



/3| 



3/?2(var{/32})2 2/3iCOv{/3i,/32} 



/3f 



/3| 



(9) 



and that 



var {? 



'cj}«<,'T^l0g:(2). 

We can now provide, for example, an approximate 95% confidence interval (CI) [i_,L+] 
for k; i.e., 

P[L_ < K < L+] « 0.95, where L± = k ± 1.96crK. 

The event -L_ < k < i+ is equivalent to the event tj2 
mate 95% CI for t^.j is given by [2~^' 



L_ 



< T, 



cj 



< ^.2^+, 



so an approxi- 



'CJ7 ■ 



'CJJ 



In practical applications, we can 
from (2) for v^ 



estimate a\ in a 'plug-in' manner by using Aj ^ from (5) for A^ ^ in (7), 

in (8), and /3i and /32 for /3i and /32 in (9). 

A caveat about our approach is that it is conditioned upon the observed pattern j>^ > 
^?±i correctly indicating the presence of a local characteristic scale somewhere in the vicinity 
of r, . As iV — > oo, observed patterns wih agree better and better with true patterns because 
of the asymptotic properties of the wavelet variance estimators, but observed patterns might 
be deceptive for finite sample sizes. A sanity check that sheds some light on the validity 
of an observed pattern is to generate a large number of independent realizations from a 
trivariate Gaussian distribution with mean vector [j>? j^, j>? i>?_^]^] and covariance matrix 

dictated by (7) with Aj^k replaced by Aj.fe of equation (5). If the proportion of realizations 
that have a maximum in the same location as the observed pattern is large, then we have 
some reassurance that the observed pattern is faithfully mimicking the true pattern (see 
Section 6.3 for an example of this procedure). 



5. Monte Carlo Experiments 

We consider the following four zero-mean Gaussian stationary processes, whose wavelet 
variances are depicted in Fig. 1: 
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Table 1 . Results from Monte Carlo experiments (see text for 
details). 



process 


iV 


Tc 


Tc 


M 


% coverage 


(a) 


512 


4.53 


4.69 


992 


88.2 




2048 




4.66 


1000 


87.1 




8192 




4.57 


1000 


94.4 


(b) 


512 


5.87 


6.16 


981 


89.2 




2048 




5.84 


1000 


90.5 




8192 




5.83 


1000 


93.8 


(c) 


512 


30.42 


33.51 


838 


86.8 




2048 




32.49 


964 


88.5 




8192 




31.41 


1000 


93.1 


(d) 


512 


3.76 


4.00 


971 


91.0 




2048 




3.84 


999 


96.7 




8192 




3.76 


1000 


96.1 


(d) 


512 


122.96 


89.31 


454 


86.8 




2048 




149.60 


703 


88.9 




8192 




153.43 


775 


86.7 



(a) an AR(1) process with a variance of 4 and a unit-lag autocorrelation of </) ~ 0.7; 

(b) a process given by ^Xt + -j^Yt, where {Xt\ is an AR(1) process with ~ 0.75, while 

0.45; 

= 0.95, while {Yt} is a white 



—n^^t -^ -72'Yt, where {Xt} is an AR(1) process with 



{Yt} is an FD process with long- memory parameter S 

noise process; and 
(d) /oXt + -j^Yt, where {Xt} is an AR(1) process with 
process with (j) — 0.99. 



0.65, while {Yt} is a similar 



For creating the last three processes, {Xt} and {Yt} are unit variance Gaussian processes 
such that Xs and Yt are independent for all s and t. 

For each process and for samples size N — 512, 2048 and 8192, we generated 1000 
realizations using an exact simulation method for AR processes (Kay, 1981) and for FD 
processes (Davies and Harte, 1987; Craigmile, 2003). We recorded the number of replica- 
tions M for which there was a peak in the Haar wavelet variance curve at either the proper 
level j or levels j ± 1. For each of these M realizations, we estimated the characteristic scale 
and computed a 95% CI using the plug-in procedure described above (the estimates Aj^^ 
were formed using periodograms, which are a special case of a lag window estimator with 
Wj^T = 1 for all T and with -q ~2). Table 1 shows the average of the estimated characteristic 
scales and the percentage of times that the 95% CIs trapped the true characteristic scale. 
There is a tendency for Tc to be positively biased. The closeness of coverage percentage 
to the nominal 95% tends to depend upon the true Tc'- the smaller Tc is, the better the 
coverage rate. For small sample sizes, the coverage rate tends to be below the nominal 
95%. The coverage rates tend to improve with increasing sample size, as asymptotic theory 
would suggest. These experiments show that the large-sample theory gives useful - but 
admittedly not perfect - approximations to the variability in Tc for moderate sample sizes. 
(Similar results were obtained using Daubechies wavelet filters of lengths Li — 4: and 8.) 
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Fig. 4. Four time series (upper four plots) and one derived series (bottom plot). The series are 
(a) monthly global temperature anomalies; (b) coherent structures in river flows; (c) 200-hPa velocity 
potential anomalies in the atmosphere; (d') Arctic sea ice thickness; and (d) indicator series for 
medium multiyear Arctic sea ice. 
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6. Real-World Examples 

Here we consider four examples of characteristic scale estimation based upon actual time 
series and the Haar wavelet. 



6. 1. Global Temperature Record 

Figure 4(a) shows a time series of length N — 1560 of monthly global temperature anoma- 
lies (land and ocean combined) from January 1880 up to December 2009 (data obtained 
from http://www.ncdc.noaa.gov/pub/data/ajiomalies/). The series appears to have a 
prominent trend upwards. Tsonis et al. (1998) analyzed a closely related series by forming 
cumulative sums and considering standard deviations of lagged differences of these sums as 
a function of lag. Because trends adversely affect this method, they elected to detrend the 
series using the first three empirical orthogonal functions extracted from a singular spec- 
trum analysis (Eisner and Tsonis, 1996). Subtraction of these functions from the time series 
yielded a residual series that was subjected to a test for non-zero trend using a Cox-Stuart 
nonparametric test (Cox and Stuart, 1955). Since the test failed to reject the null hypothe- 
sis of no trend, they used the residuals to estimate a characteristic scale, obtaining a value 
of about 20 months. This scale was the value at which the curve of standard deviations 
versus lag exhibited a change in slope in log/log space. They interpreted this characteristic 
scale as being due to the influence of El Niiio/La Nifia cycles on global temperatures. 

Figure 5(a) shows that the Haar wavelet variance curve for this series has a peak at scale 
T5 ~ 16 months. The corresponding estimated characteristic scale is fc,5 = 14.9 months, 
with an associated 95% CI of [9.6, 23.0]. This interval traps the nominal characteristic scale 
found by Tsonis et al. (1998). There is, however, some cause for concern here due to the 
apparent trend in this time series. If we model the series as Xt = a + bt + Yt , where a + bt 
describes a linear (first-order polynomial) trend and {Yt} is a zero mean stationary process, 
then the Haar wavelet coefficient process {W'j,*} is stationary, but has a nonzero mean, 
which is in conflict with the zero mean assumption used to construct D'^ of Equation (2). 
The wavelet coefficients most influenced by a linear trend are those at the largest scales, 
which explains the upward pattern in the wavelet variance curve of Fig. 5(a) at those scales. 
Since there is some concern that the wavelet variance estimates at smaller scales might also 
be adversely affected by the trend, we also considered a wavelet variance curve constructed 
using a Daubechies wavelet filter of width Li — 8 (the so-called 'least asymmetric' filter). 
This filter is capable of completely eliminating a trend that is well-approximated by a 
third-order polynomial because of its embedded backward difference filter of order d — 4 
(Craigmile et al., 2004). The wavelet variance curve for this filter also exhibits a peak at 
scale Tr,. The corresponding estimated characteristic scale is f^^s = 16.0 months, with an 
associated 95% CI of [11.1, 23.3], all of which are in reasonable agreement with the results 
from the Haar wavelet. This example points out that, because of differencing operations 
embedded within wavelet filters, there is no need to detrend a time series prior to a wavelet 
analysis if care is taken in selecting a wavelet filter of appropriate length Li to handle the 
nature of the apparent trend in the series. 



6.2. Coherent Structures in River Flow 

Figure 4(b) shows a time series capturing so-called 'coherent structures' (such as boils or 
eddies) in river flows (Chickadel et al., 2009; data courtesy of Alex Horner-Devine and 
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Fig. 5. Log of wavelet variances u^ versus log of r^ (circles) for (a) monthly global temperature 
anomalies; (b) coherent structures in river flows; (c) 200-hPa velocity potential anomalies; and 
(d) medium multiyear Arctic sea ice. The vertical dashed lines indicate the locations the estimated 
characteristic scales fcj, while the gray curves show the quadratic fit whose maximum location de- 
termines Tc,j. The horizontal solid lines depict 95% confidence intervals for the true characteristic 
scales. 



Bronwyn Hayworth, Department of Civil and Environmental Engineering, University of 
Washington). The 2048 values shown in the plot are from a longer series of length N = 29972 
that has a sampling interval of A = 1/25 sec and spans a little less than 20 min (the subscrics 
in the plot is from the first 82 sec). This time series is derived from measurements from 
three transducers and a velocity profiler set on the bottom of the Snohomish River Estuary 
in Washington State immediately downstream of a sill pointing upwards. The structures 
are essentially quasi-periodic upwellings from the river that appear as temporary 'blobs' 
on the surface of the river. Each blob dissipates within a second or so, and then another 
blob forms sometimes later. As the tide increases, the water velocity increases, and the 
frequency at which the blobs occur appears to increase. 

Videos of the river surface clearly show these boils qualitatively, but quantifying this 
little- understood phenomenon using standard Fourier-based spectral analysis is problematic 
because it appears as a small perturbation in a low- frequency rolloff. By contrast, the scale- 
based analysis afforded by the wavelet variance (Fig. 5(b)) clearly displays a peak in its 
decomposition of the sample variance, rendering the phenomenon as interpretable in terms 
of a characteristic scale. The estimated standardized characteristic scale is fcfi = 41.1, which 
corresponds to a physical characteristic scale of fcfi A — 1.6 sec, with an associated 95% 
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CI of [1.4, 1.9] sec. The time-evolving properties of the boils can be studied by estimating 
characteristic scales for time series spanning successive 20-minutc time intervals. 



6.3. Madden-Julian Oscillation (MJO) 

Figure 4(c) shows the first 2048 values from a time series of 200-hPa velocity potential 
anomalies equatorward of latitude 30°N and at longitude 80°E (one of a number of daily 
MJO indices available from http : //www . cpc. ncep . noaa . gov/) . The entire series has 2354 
values covering 3 January 1978 to 29 March 2010 with a sampling interval of A = 5 days. 
This series is one manifestation of the MJO, which Madden and Julian (1994) define as a 
40-50 day oscillation appearing in various atmospheric time series collected in the tropics. 
The periods associated with the MJO have been revised since 1994 based upon subse- 
quent analysis of additional time series - the MJO is now sometimes called a 30-60 day 
or intraseasonal oscillation. The wavelet variance plot for the velocity potential anomalies 
(Fig. 5(c)) has a local peak at scale T2, with associated standardized local characteristic 
scale fc,2 = 2.53, which converts into a physical scale of fc^2 A = 12.7 days. The associated 
95% CI is [11.9,13.5] days. Since a physical scale of r A is associated with the interval 
of periods [2r A,4tA], the point estimate fc_2 A matches up with 25-51 day oscillations 
and hence with the description of the MJO as a 30-60 day oscillation. A difficulty with 
using Fourier-based spectral analysis to deduce the MJO is the lack of a standard way to 
determine the beginning and end of the frequency interval associated with this broad-band 
oscillation. The notion of a characteristic scale bypasses this difficulty and opens up a means 
of objectively tracking how the MJO varies across time and over different time series. 

In addition to the peak at T2, there is a second one at ry, which leads to an estimated 
physical characteristic scale of fc,7 A ^ 304 days and an associated 95% CI of [79,1170]. 
The interval of periods associated with Tc,7 is [609, 1217] days, so this local characteristic 
scale suggests an oscillation spanning two to three years that is about 5 times weaker than 
the MJO. The presence of this weak additional oscillation is conditional upon the peak 
pattern in the wavelet variance estimates being correct. As an example of the reality check 
described at the end of Section 4, we generated 100, 000 realization from a trivariate normal 
distribution with mean [i>|, i>|, i)!]-^ and covariance dictated by Equation (7), with Aj^k 
estimated per Equation (5). Of these realizations, 60% obeyed the observed v\ <v^ > v^ 
pattern, but the remaining 40% did not, casting considerable doubt on the validity of the 
observed peak pattern. (A similar test on the MJO peak at T2 yielded 99, 916 realizations 
with the observed peak pattern and only 84 without.) 



6.4. Medium Multiyear Arctic Sea Ice 

Figure 4(d') shows 2048 measurements of ice thickness taken at 1 m spacings along a 
transect near the North Pole in April of 1991. The entire set of data consists of iV = 49, 998 
measurements extending over 50 km and was collected by a U.S. Naval submarine with an 
upward-looking sonar (the data are archived by the National Snow and Ice Data Center at 
http : //nsidc . org/). We can regard these data as a time series with a spacing of A = 1 m, 
where here 'time' is a surrogate for distance along the submarine's path under the ice (the 
submarine was moving in the same direction at a constant speed as much as possible, and 
the data were recorded at regular intervals of time) . 

Researchers classify sea ice by thickness, with different ice types thought to be driven 
by different physical processes (Flato, 1995, and World Meteorological Organization, 2007). 
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One such type is called medium multiyear ice and has a thickness from 2 to 5 m. The 
horizontal dashed lines on Fig. 4(d') demark this ice type. Figure 4(d) shows a binary-valued 
times series indicating the absence or presence (using or 1) of this ice type. Figure 5(d) 
shows the Haar wavelet variance curve for this indicator series. The curve exhibits a single 
broad peak at scale tt, leading to an characteristic scale is Tcj = 48.9 m, with an associated 
95% CI of [29.6, 80.7] m. We can regard this characteristic scale as an indicator of the 
'typical' extent of medium multiyear ice. In the face of other evidence that the Arctic 
climate is dramatically changing, a question of considerable geophysical interest is how 
stable the characteristic scales for different ice types are both spatially and temporally. 
Because submarines have collected data on sea-ice thickness throughout the Arctic region 
since 1958, it is possible to look at temporal and spatial variations in estimated characteristic 
scales and to use the methodology developed in this paper as one way to assess changes in 
Arctic climatology over the past 50 years. 

Finally we note that there is evidence of long-range dependence in series of ice thickness 
measurements (Percival et al., 2008). This type of dependence maps over into indicator 
series, which means that the characteristic scale td of Equation (1) would be infinite for 
the medium multiyear ice indicators. By contract, the wavelet-based characteristic scale is 
finite and provides a useful summary of one aspect of the indicator series. 



7. Summary and Discussion 

We have proposed a new definition for the characteristic scale of a time series that can be 
modeled as an intrinsically stationary process. The definition is based upon local peaks 
in a plot of the wavelet variance versus scale. Since the wavelet variance provides a scale- 
based decomposition of the process variance, a characteristic scale corresponds to one that 
is contributing more to the overall variance than scales surrounding it. This wavelet-based 
definition of characteristic scale has certain advantages over other definitions, including 
abilities to (1) focus on localized properties of the process rather than asymptotic decay 
rates of autocorrelation sequences, (2) handle certain nonstationary processes and (3) handle 
series with trends that are well approximated by a low-order polynomial. We have developed 
a large-sample theory for an estimator of the wavelet-based characteristic scale, and we have 
demonstrated the use of this theory through Monte Carlo experiments and applications to 
four representative real-world time series. 

There are several avenues of research that are outside the scope of this article, of which we 
mention three of particular interest. First, the basis for our large-sample statistical theory 
for the characteristic scale estimator is that the underlying wavelet coefficient processes are 
Gaussian. This assumption does not automatically rule out the usefulness of our theory 
for non- Gaussian processes. The filtering that is required to generate wavelet coefficients 
produces a central limit effect. Thus, even if a process is non-Gaussian, its associated wavelet 
coefficient processes might be well approximated as Gaussian, particularly at large scales. 
The indicator series for medium multiyear Arctic sea ice considered in Section 6.4 is an 
example of a non-Gaussian series whose large-scale wavelet coefficients are markedly closer 
in distribution to Gaussian than the original series. While limited tests to date indicate 
that the Gaussian-based large-sample theory for Tc.j is reasonably valid for indicator scries, 
there is certainly room for additional research that examines the question of non-Gaussianity 
more thoroughly. 

Second, wc have assumed our time series to be regularly sampled, but irregularly sampled 
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series often occur in practice. The simplest form of irregularity is missing observations in 
what would otherwise be a regularly sampled series. Mondal and Percival (2010) present 
statistical theory for an estimator of the wavelet variance that works with gappy time 
series. This theory can presumably be used as the basis for a characteristic scale estimator 
for gappy time series. A more serious challenge is to provide a corresponding theory for 
time series sampled at irregular patterns. 

A third avenue for additional research is to handle two-dimensional data (collected on 
a regular grid) that display a degree of characteristic bumpiness. Gilgai patterns in the 
Bland Plain of New South Wales, Australia, provide an example of this type of data. Milne 
et al. (2010) have analyzed these patterns using a two-dimensional version of the wavelet 
variance. The notion of a characteristic scale for a two-dimensional isotropic field could be 
the basis for an interesting complementary analysis of these data. 
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